Goto

Collaborating Authors

 question-answering system


Unlocking the Potential of Multiple BERT Models for Bangla Question Answering in NCTB Textbooks

arXiv.org Artificial Intelligence

Evaluating text comprehension in educational settings is critical for understanding student performance and improving curricular effectiveness. This study investigates the capability of state-of-the-art language models--RoBERTa Base, Bangla-BERT, and BERT Base--in automatically assessing Bangla passage-based question-answering from the National Curriculum and Textbook Board (NCTB) textbooks for classes 6-10. A dataset of approximately 3,000 Bangla passage-based questionanswering instances was compiled, and the models were evaluated using F1 Score and Exact Match (EM) metrics across various hyperparameter configurations. Our findings revealed that Bangla-BERT consistently outperformed the other models, achieving the highest F1 (0.75) and EM (0.53) scores, particularly with smaller batch sizes, the inclusion of stop words, and a moderate learning rate. In contrast, RoBERTa Base demonstrated the weakest performance, with the lowest F1 (0.19) and EM (0.27) scores under certain configurations. The results underscore the importance of fine-tuning hyperparameters for optimizing model performance and highlight the potential of machine learning models in evaluating text comprehension in educational contexts. However, limitations such as dataset size, spelling inconsistencies, and computational constraints emphasize the need for further research to enhance the robustness and applicability of these models. This study lays the groundwork for the future development of automated evaluation systems in educational institutions, providing critical insights into model performance in the context of Bangla text comprehension.


Domain-specific Question Answering with Hybrid Search

arXiv.org Artificial Intelligence

With the increasing adoption of Large Language Models A production-ready, generalizable framework for LLMbased (LLMs) in enterprise settings, ensuring accurate and reliable QA systems built on Elasticsearch question-answering systems remains a critical challenge. A flexible hybrid retrieval mechanism combining dense Building upon our previous work on domain-specific and sparse search methods question answering about Adobe products (Sharma et al. A comprehensive evaluation framework for assessing 2024), which established a retrieval-aware framework with QA system performance self-supervised training, we now present a production-ready, Empirical analysis demonstrating the effectiveness of our generalizable architecture alongside a comprehensive evaluation approach across various metrics methodology. Our core contribution is a flexible, scalable framework built on Elasticsearch that can be adapted Through this work, we provide not only theoretical insights for any LLM-based question-answering system. This framework but also a practical, deployable solution for building reliable seamlessly integrates hybrid retrieval mechanisms, domain-specific question-answering systems that can combining dense and sparse search with boost matching, be adapted to various enterprise needs.


Advancements and Challenges in Bangla Question Answering Models: A Comprehensive Review

arXiv.org Artificial Intelligence

The domain of Natural Language Processing (NLP) has experienced notable progress in the evolution of Bangla Question Answering (QA) systems. This paper presents a comprehensive review of seven research articles that contribute to the progress in this domain. These research studies explore different aspects of creating question-answering systems for the Bangla language. They cover areas like collecting data, preparing it for analysis, designing models, conducting experiments, and interpreting results. The papers introduce innovative methods like using LSTM-based models with attention mechanisms, context-based QA systems, and deep learning techniques based on prior knowledge. However, despite the progress made, several challenges remain, including the lack of well-annotated data, the absence of high-quality reading comprehension datasets, and difficulties in understanding the meaning of words in context. Bangla QA models' precision and applicability are constrained by these challenges. This review emphasizes the significance of these research contributions by highlighting the developments achieved in creating Bangla QA systems as well as the ongoing effort required to get past roadblocks and improve the performance of these systems for actual language comprehension tasks.


A Russian Jeopardy! Data Set for Question-Answering Systems

arXiv.org Artificial Intelligence

Question answering (QA) is one of the most common NLP tasks that relates to named entity recognition, fact extraction, semantic search and some other fields. In industry, it is much appreciated in chatbots and corporate information systems. It is also a challenging task that attracted the attention of a very general audience at the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions with 29,375 from the Russian analogue of Jeopardy! - "Own Game". We observe its linguistic features and the related QA-task. We conclude about perspectives of a QA competition based on the data set collected from this database.


Question-Answering System for Bangla: Fine-tuning BERT-Bangla for a Closed Domain

arXiv.org Artificial Intelligence

Question-answering systems for Bengali have seen limited development, particularly in domain-specific applications. Leveraging advancements in natural language processing, this paper explores a fine-tuned BERT-Bangla model to address this gap. It presents the development of a question-answering system for Bengali using a fine-tuned BERT-Bangla model in a closed domain. The dataset was sourced from Khulna University of Engineering \& Technology's (KUET) website and other relevant texts. The system was trained and evaluated with 2500 question-answer pairs generated from curated data. Key metrics, including the Exact Match (EM) score and F1 score, were used for evaluation, achieving scores of 55.26\% and 74.21\%, respectively. The results demonstrate promising potential for domain-specific Bengali question-answering systems. Further refinements are needed to improve performance for more complex queries.


R2GQA: Retriever-Reader-Generator Question Answering System to Support Students Understanding Legal Regulations in Higher Education

arXiv.org Artificial Intelligence

In this article, we propose the R2GQA system, a Retriever-Reader-Generator Question Answering system, consisting of three main components: Document Retriever, Machine Reader, and Answer Generator. The Retriever module employs advanced information retrieval techniques to extract the context of articles from a dataset of legal regulation documents. The Machine Reader module utilizes state-of-the-art natural language understanding algorithms to comprehend the retrieved documents and extract answers. Finally, the Generator module synthesizes the extracted answers into concise and informative responses to questions of students regarding legal regulations. Furthermore, we built the ViRHE4QA dataset in the domain of university training regulations, comprising 9,758 question-answer pairs with a rigorous construction process. This is the first Vietnamese dataset in the higher regulations domain with various types of answers, both extractive and abstractive. In addition, the R2GQA system is the first system to offer abstractive answers in Vietnamese. This paper discusses the design and implementation of each module within the R2GQA system on the ViRHE4QA dataset, highlighting their functionalities and interactions. Furthermore, we present experimental results demonstrating the effectiveness and utility of the proposed system in supporting the comprehension of students of legal regulations in higher education settings. In general, the R2GQA system and the ViRHE4QA dataset promise to contribute significantly to related research and help students navigate complex legal documents and regulations, empowering them to make informed decisions and adhere to institutional policies effectively. Our dataset is available for research purposes.


MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering

arXiv.org Artificial Intelligence

Question-answering systems have revolutionized information retrieval, but linguistic and cultural boundaries limit their widespread accessibility. This research endeavors to bridge the gap of the absence of efficient QnA datasets in low-resource languages by translating the English Question Answering Dataset (SQuAD) using a robust data curation approach. We introduce MahaSQuAD, the first-ever full SQuAD dataset for the Indic language Marathi, consisting of 118,516 training, 11,873 validation, and 11,803 test samples. We also present a gold test set of manually verified 500 examples. Challenges in maintaining context and handling linguistic nuances are addressed, ensuring accurate translations. Moreover, as a QnA dataset cannot be simply converted into any low-resource language using translation, we need a robust method to map the answer translation to its span in the translated passage. Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language. Thus, we offer a scalable approach to bridge linguistic and cultural gaps present in low-resource languages, in the realm of question-answering systems. The datasets and models are shared publicly at https://github.com/l3cube-pune/MarathiNLP .


Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study

arXiv.org Artificial Intelligence

AI-powered question-answering (Q&A) systems have emerged as important tools, alongside established search technologies, to enable quick access to relevant information and knowledge from large digital sources that are complex and time-consuming for humans to navigate. Advancements in large language models (LLMs) have revolutionized the field of Q&A, with models like GPT-3 (Brown et al. 2020), BERT (Devlin et al. 2018), and RoBERTa (Liu et al. 2019) demonstrating remarkable abilities in understanding and generating human-like text. However, the effectiveness of such models in handling domain-specific questions that require specialized knowledge is limited. Retrieval-augmented generation (RAG) techniques, which combine information retrieval and generative models (Lewis et al. 2021), have shown promise in boosting the quality of LLM output in Q&A tasks. RAG systems leverage the strengths of both retrieval and generation components to provide contextually relevant and informative responses. While there is a lack of established quantification of RAG accuracy, early findings suggest that generic RAG does not perform well in complex domains such as finance. In one instance, RAG based on generic LLMs such as GPT-4-Turbo fails to answer 81% of the questions derived from Securities and Exchange Commission (SEC) financial filings (Islam et al. 2023). Aitomatic, Inc. (except as noted, all authors are from Aitomatic)


A RAG-based Question Answering System Proposal for Understanding Islam: MufassirQAS LLM

arXiv.org Artificial Intelligence

Challenges exist in learning and understanding religions, such as the complexity and depth of religious doctrines and teachings. Chatbots as question-answering systems can help in solving these challenges. LLM chatbots use NLP techniques to establish connections between topics and accurately respond to complex questions. These capabilities make it perfect for enlightenment on religion as a question-answering chatbot. However, LLMs also tend to generate false information, known as hallucination. Also, the chatbots' responses can include content that insults personal religious beliefs, interfaith conflicts, and controversial or sensitive topics. It must avoid such cases without promoting hate speech or offending certain groups of people or their beliefs. This study uses a vector database-based Retrieval Augmented Generation (RAG) approach to enhance the accuracy and transparency of LLMs. Our question-answering system is called "MufassirQAS". We created a database consisting of several open-access books that include Turkish context. These books contain Turkish translations and interpretations of Islam. This database is utilized to answer religion-related questions and ensure our answers are trustworthy. The relevant part of the dataset, which LLM also uses, is presented along with the answer. We have put careful effort into creating system prompts that give instructions to prevent harmful, offensive, or disrespectful responses to respect people's values and provide reliable results. The system answers and shares additional information, such as the page number from the respective book and the articles referenced for obtaining the information. MufassirQAS and ChatGPT are also tested with sensitive questions. We got better performance with our system. Study and enhancements are still in progress. Results and future works are given.


Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty, and it is progressively expanded with validated knowledge. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.